8 research outputs found

    Patrixa: A unification-based parser for Basque and its application to the automatic analysis of verbs

    In this chapter we describe a computational grammar for Basque and the first results obtained using it to automatically acquire subcategorization information about verbs and their associated sentence elements (arguments and adjuncts). In section 1 we describe Basque syntax and the grammar we have developed for its treatment. The grammar is partial in the sense that it cannot recognize every sentence in real texts, but it is capable of describing the main syntactic elements, such as noun phrases (NPs), prepositional phrases (PPs), and subordinate and simple sentences. This can be useful for several applications. In section 2 we explain the syntactic analyzer (or parser) used to automatically acquire information on verbal subcategorization from texts. The results will later be used by a linguist or processed by statistical filters. This work has been done by the IXA Natural Language Processing research group, centered on the application of automatic methods to the analysis of Basque.

    A Cascaded Syntactic Analyser for Basque

    This article presents a robust syntactic analyser for Basque and the different modules it contains. Each module is structured in analysis layers, where each layer takes the information provided by the previous layer as its input, thus creating a gradually deeper syntactic analysis in cascade. This analysis is carried out using the Constraint Grammar (CG) formalism. Moreover, the article describes the standardisation process of the parsing formats using XML.
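
    The cascading idea in the abstract above, where each layer consumes the output of the previous one, can be sketched as a simple function pipeline. This is only an illustration of the layering principle, not the analyser described in the article: it does not use Constraint Grammar, and the layer names (`tokenize`, `tag_capitalised`, `group_names`) and the `run_cascade` helper are hypothetical.

```python
from functools import reduce

# Each layer takes the analysis built so far and returns an enriched copy.
# All layer names below are hypothetical toy stand-ins, not real CG modules.

def tokenize(analysis):
    # Layer 1: split the raw text into whitespace-separated tokens.
    return {**analysis, "tokens": analysis["text"].split()}

def tag_capitalised(analysis):
    # Layer 2: use the tokens from layer 1 to assign a crude tag per token.
    tags = ["NAME" if t[0].isupper() else "WORD" for t in analysis["tokens"]]
    return {**analysis, "tags": tags}

def group_names(analysis):
    # Layer 3: use the tags from layer 2 to collect the tokens tagged NAME.
    names = [t for t, tag in zip(analysis["tokens"], analysis["tags"]) if tag == "NAME"]
    return {**analysis, "names": names}

def run_cascade(text, layers):
    # Feed the output of each layer into the next one, in order.
    return reduce(lambda analysis, layer: layer(analysis), layers, {"text": text})

result = run_cascade("EPEC is a corpus of Basque", [tokenize, tag_capitalised, group_names])
```

    Because every layer only reads keys produced earlier in the cascade, layers can be added or replaced without touching the ones before them.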

    Learning Argument/Adjunct Distinction for Basque

    This paper presents experiments on lexical knowledge acquisition in the form of verbal argumental information. The system obtains the data from raw corpora after the application of a partial parser and statistical filters. We used two different statistical filters to acquire the argumental information: Mutual Information and Fisher's Exact Test.
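
    The two statistical filters named in the abstract above can be illustrated with a minimal sketch. It assumes co-occurrence counts between a verb and a case-marked phrase type taken from parsed text. The function names, the one-sided form of the test, and the 2x2 table layout are assumptions made for this example, and the paper's actual setup may differ.

```python
from math import comb, log2

def pmi(n_vc, n_v, n_c, n_total):
    # Pointwise mutual information, in bits, between a verb v and a case c:
    # log2( P(v, c) / (P(v) * P(c)) ), estimated from raw counts.
    return log2((n_vc * n_total) / (n_v * n_c))

def fisher_exact_greater(a, b, c, d):
    # One-sided Fisher's exact test on the 2x2 table
    #              case   other cases
    #   verb        a         b
    #   other verbs c         d
    # Returns the probability of seeing a or more (verb, case) co-occurrences
    # under independence, with all row and column totals held fixed.
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    numer = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return numer / denom

# Toy counts, invented for illustration only.
score = pmi(n_vc=20, n_v=20, n_c=50, n_total=100)
p_value = fisher_exact_greater(3, 0, 0, 3)
```

    A high PMI or a small p-value suggests that the verb and the case co-occur more often than chance would predict, which is the kind of evidence such a filter would use to propose the phrase as an argument rather than an adjunct.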

    Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing

    This article describes the different steps in the construction of EPEC (Reference Corpus for the Processing of Basque). EPEC is a corpus of standard written Basque that has been manually tagged at different levels (morphology, surface syntax, phrases) and is currently being hand-tagged at the deep syntax level following the Dependency Structure-based Scheme. It is intended to be a "reference" corpus for the development and improvement of several NLP tools for Basque. This corpus has already been used for the construction of some tools, such as a morphological analyser, a lemmatiser, and a shallow syntactic analyser.

    The Design of a Digital Resource to Store the Knowledge of Linguistic Errors

    In this paper we present the design of a digital resource which will be used as a repository of information about linguistic errors. As a first step in the design of this database, we made a classification of possible errors. This classification is based on information contained in Basque grammars (Alberdi et al., 2001; Zubiri, 1994) and our previous experience in representing the knowledge of language students during their learning process (Díaz de Ilarraza et al., 1997). It has also been carried out in collaboration with linguists of our group (http://ixa.si.ehu.es). To validate this classification, a questionnaire was presented to experienced Basque teachers and proofreaders from newspapers and publishing houses. With their advice we completed a classification of possible errors. We designed a Zope interface (Zope is a framework for building web applications that lets you connect to external databases (Latteier et al., 2001)) so that linguists and experts in the subject will be able to enter, over the Internet, any error found in a corpus (along with its corresponding information

    GrAF version of the Basque Dependency Treebank

    This is the stand-off GrAF version of the Basque Dependency Treebank (BDT). It is the Reference Corpus for the Processing of Basque (EPEC) annotated at the syntactic level. EPEC is a 300,000-word corpus of standard written journal texts which aims to be a training corpus for the development and improvement of several Natural Language Processing tools. It has been manually tagged at different levels: morphology, partial syntax and semantics. This is the stand-off GrAF version of the Constituent Basque Treebank.